AITopics | benign model

Collaborating Authors

benign model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MARS: A Malignity-Aware Backdoor Defense in Federated Learning

Wan, Wei, Ning, Yuxuan, Huang, Zhicong, Hong, Cheng, Hu, Shengshan, Zhou, Ziqi, Zhang, Yechao, Zhu, Tianqing, Zhou, Wanlei, Zhang, Leo Yu

arXiv.org Artificial IntelligenceSep-26-2025

Federated Learning (FL) is a distributed paradigm aimed at protecting participant data privacy by exchanging model parameters to achieve high-quality model training. However, this distributed nature also makes FL highly vulnerable to backdoor attacks. Notably, the recently proposed state-of-the-art (SOTA) attack, 3DFed (SP2023), uses an indicator mechanism to determine whether the backdoor models have been accepted by the defender and adaptively optimizes backdoor models, rendering existing defenses ineffective. In this paper, we first reveal that the failure of existing defenses lies in the employment of empirical statistical measures that are loosely coupled with backdoor attacks. Motivated by this, we propose a Malignity-Aware backdooR defenSe (MARS) that leverages backdoor energy (BE) to indicate the malicious extent of each neuron. To amplify malignity, we further extract the most prominent BE values from each model to form a concentrated backdoor energy (CBE). Finally, a novel Wasserstein distance-based clustering method is introduced to effectively identify backdoor models. Extensive experiments demonstrate that MARS can defend against SOTA backdoor attacks and significantly outperforms existing defenses.

artificial intelligence, machine learning, mars, (18 more...)

arXiv.org Artificial Intelligence

2509.20383

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

PickleBall: Secure Deserialization of Pickle-based Machine Learning Models (Extended Report)

Kellas, Andreas D., Christou, Neophytos, Jiang, Wenxin, Li, Penghui, Simon, Laurent, David, Yaniv, Kemerlis, Vasileios P., Davis, James C., Yang, Junfeng

arXiv.org Artificial IntelligenceSep-22-2025

Machine learning model repositories such as the Hugging Face Model Hub facilitate model exchanges. However, bad actors can deliver malware through compromised models. Existing defenses such as safer model formats, restrictive (but inflexible) loading policies, and model scanners have shortcomings: 44.9% of popular models on Hugging Face still use the insecure pickle format, 15% of these cannot be loaded by restrictive loading policies, and model scanners have both false positives and false negatives. Pickle remains the de facto standard for model exchange, and the ML community lacks a tool that offers transparent safe loading. We present PickleBall to help machine learning engineers load pickle-based models safely. PickleBall statically analyzes the source code of a given machine learning library and computes a custom policy that specifies a safe load-time behavior for benign models. PickleBall then dynamically enforces the policy during load time as a drop-in replacement for the pickle module. PickleBall generates policies that correctly load 79.8% of benign pickle-based models in our dataset, while rejecting all (100%) malicious examples in our dataset. In comparison, evaluated model scanners fail to identify known malicious models, and the state-of-art loader loads 22% fewer benign models than PickleBall. PickleBall removes the threat of arbitrary function invocation from malicious pickle-based models, raising the bar for attackers to depend on code reuse techniques.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.15987

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Sports (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)

Add feedback

BAPFL: Exploring Backdoor Attacks Against Prototype-based Federated Learning

Zeng, Honghong, Lou, Jiong, Wang, Zhe, Zhou, Hefeng, Wu, Chentao, Zhao, Wei, Li, Jie

arXiv.org Artificial IntelligenceSep-17-2025

Prototype-based federated learning (PFL) has emerged as a promising paradigm to address data heterogeneity problems in federated learning, as it leverages mean feature vectors as prototypes to enhance model generalization. However, its robustness against backdoor attacks remains largely unexplored. In this paper, we identify that PFL is inherently resistant to existing backdoor attacks due to its unique prototype learning mechanism and local data heterogeneity. To further explore the security of PFL, we propose BAPFL, the first backdoor attack method specifically designed for PFL frameworks. BAPFL integrates a prototype poisoning strategy with a trigger optimization mechanism. The prototype poisoning strategy manipulates the trajectories of global prototypes to mislead the prototype training of benign clients, pushing their local prototypes of clean samples away from the prototypes of trigger-embedded samples. Meanwhile, the trigger optimization mechanism learns a unique and stealthy trigger for each potential target label, and guides the prototypes of trigger-embedded samples to align closely with the global prototype of the target label. Experimental results across multiple datasets and PFL variants demonstrate that BAPFL achieves a 35\%-75\% improvement in attack success rate compared to traditional backdoor attacks, while preserving main task accuracy. These results highlight the effectiveness, stealthiness, and adaptability of BAPFL in PFL.

artificial intelligence, machine learning, prototype, (15 more...)

arXiv.org Artificial Intelligence

2509.12964

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version)

Sheybani, Nojan, Pegoraro, Alessandro, Knauer, Jonathan, Rieger, Phillip, Mollakuqe, Elissa, Koushanfar, Farinaz, Sadeghi, Ahmad-Reza

arXiv.org Artificial IntelligenceSep-15-2025

Split Learning (SL) is a distributed learning approach that enables resource-constrained clients to collaboratively train deep neural networks (DNNs) by offloading most layers to a central server while keeping in- and output layers on the client-side. This setup enables SL to leverage server computation capacities without sharing data, making it highly effective in resource-constrained environments dealing with sensitive data. However, the distributed nature enables malicious clients to manipulate the training process. By sending poisoned intermediate gradients, they can inject backdoors into the shared DNN. Existing defenses are limited by often focusing on server-side protection and introducing additional overhead for the server. A significant challenge for client-side defenses is enforcing malicious clients to correctly execute the defense algorithm. We present ZORRO, a private, verifiable, and robust SL defense scheme. Through our novel design and application of interactive zero-knowledge proofs (ZKPs), clients prove their correct execution of a client-located defense algorithm, resulting in proofs of computational integrity attesting to the benign nature of locally trained DNN portions. Leveraging the frequency representation of model partitions enables ZORRO to conduct an in-depth inspection of the locally trained models in an untrusted environment, ensuring that each client forwards a benign checkpoint to its succeeding client. In our extensive evaluation, covering different model architectures as well as various attack strategies and data scenarios, we show ZORRO's effectiveness, as it reduces the attack success rate to less than 6\% while causing even for models storing \numprint{1000000} parameters on the client-side an overhead of less than 10 seconds.

artificial intelligence, machine learning, zorro, (15 more...)

arXiv.org Artificial Intelligence

2509.09787

Country:

Asia (1.00)
North America > United States > California (0.93)
Europe (0.68)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law (0.92)
Government > Military (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction

Qiao, Ting, Li, Yiming, Li, Jianbin, Wang, Yingjia, Qi, Leyi, Guo, Junfeng, Feng, Ruili, Tao, Dacheng

arXiv.org Artificial IntelligenceJun-17-2025

Deep neural networks (DNNs) rely heavily on high-quality open-source datasets (e.g., ImageNet) for their success, making dataset ownership verification (DOV) crucial for protecting public dataset copyrights. In this paper, we find existing DOV methods (implicitly) assume that the verification process is faithful, where the suspicious model will directly verify ownership by using the verification samples as input and returning their results. However, this assumption may not necessarily hold in practice and their performance may degrade sharply when subjected to intentional or unintentional perturbations. To address this limitation, we propose the first certified dataset watermark (i.e., CertDW) and CertDW-based certified dataset ownership verification method that ensures reliable verification even under malicious attacks, under certain conditions (e.g., constrained pixel-level perturbation). Specifically, inspired by conformal prediction, we introduce two statistical measures, including principal probability (PP) and watermark robustness (WR), to assess model prediction stability on benign and watermarked samples under noise perturbations. We prove there exists a provable lower bound between PP and WR, enabling ownership verification when a suspicious model's WR value significantly exceeds the PP values of multiple benign models trained on watermark-free datasets. If the number of PP values smaller than WR exceeds a threshold, the suspicious model is regarded as having been trained on the protected dataset. Extensive experiments on benchmark datasets verify the effectiveness of our CertDW method and its resistance to potential adaptive attacks. Our codes are at \href{https://github.com/NcepuQiaoTing/CertDW}{GitHub}.

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2506.1316

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Asia > Singapore (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Sensing and Signal Processing (0.92)
(2 more...)

Add feedback

Targeted Data Poisoning for Black-Box Audio Datasets Ownership Verification

Bouaziz, Wassim, El-Mhamdi, El-Mahdi, Usunier, Nicolas

arXiv.org Artificial IntelligenceMar-13-2025

Protecting the use of audio datasets is a major concern for data owners, particularly with the recent rise of audio deep learning models. While watermarks can be used to protect the data itself, they do not allow to identify a deep learning model trained on a protected dataset. In this paper, we adapt to audio data the recently introduced data taggants approach. Data taggants is a method to verify if a neural network was trained on a protected image dataset with top-$k$ predictions access to the model only. This method relies on a targeted data poisoning scheme by discreetly altering a small fraction (1%) of the dataset as to induce a harmless behavior on out-of-distribution data called keys. We evaluate our method on the Speechcommands and the ESC50 datasets and state of the art transformer models, and show that we can detect the use of the dataset with high confidence without loss of performance. We also show the robustness of our method against common data augmentation techniques, making it a practical method to protect audio datasets.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2503.10269

Country:

Europe > France > Île-de-France > Paris > Paris (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.66)

Industry: Information Technology > Security & Privacy (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SafeSplit: A Novel Defense Against Client-Side Backdoor Attacks in Split Learning

Rieger, Phillip, Pegoraro, Alessandro, Kumari, Kavita, Abera, Tigist, Knauer, Jonathan, Sadeghi, Ahmad-Reza

arXiv.org Artificial IntelligenceJan-11-2025

Split Learning (SL) is a distributed deep learning approach enabling multiple clients and a server to collaboratively train and infer on a shared deep neural network (DNN) without requiring clients to share their private local data. The DNN is partitioned in SL, with most layers residing on the server and a few initial layers and inputs on the client side. This configuration allows resource-constrained clients to participate in training and inference. However, the distributed architecture exposes SL to backdoor attacks, where malicious clients can manipulate local datasets to alter the DNN's behavior. Existing defenses from other distributed frameworks like Federated Learning are not applicable, and there is a lack of effective backdoor defenses specifically designed for SL. We present SafeSplit, the first defense against client-side backdoor attacks in Split Learning (SL). SafeSplit enables the server to detect and filter out malicious client behavior by employing circular backward analysis after a client's training is completed, iteratively reverting to a trained checkpoint where the model under examination is found to be benign. It uses a two-fold analysis to identify client-induced changes and detect poisoned models. First, a static analysis in the frequency domain measures the differences in the layer's parameters at the server. Second, a dynamic analysis introduces a novel rotational distance metric that assesses the orientation shifts of the server's layer parameters during training. Our comprehensive evaluation across various data distributions, client counts, and attack scenarios demonstrates the high efficacy of this dual analysis in mitigating backdoor attacks while preserving model utility.

artificial intelligence, machine learning, safesplit, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.14722/ndss.2025.241698

2501.0665

Country:

Europe (0.28)
North America > United States (0.28)
Asia (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LoBAM: LoRA-Based Backdoor Attack on Model Merging

Yin, Ming, Zhang, Jingyang, Sun, Jingwei, Fang, Minghong, Li, Hai, Chen, Yiran

arXiv.org Artificial IntelligenceDec-2-2024

Model merging is an emerging technique that integrates multiple models fine-tuned on different tasks to create a versatile model that excels in multiple domains. This scheme, in the meantime, may open up backdoor attack opportunities where one single malicious model can jeopardize the integrity of the merged model. Existing works try to demonstrate the risk of such attacks by assuming substantial computational resources, focusing on cases where the attacker can fully fine-tune the pre-trained model. Such an assumption, however, may not be feasible given the increasing size of machine learning models. In practice where resources are limited and the attacker can only employ techniques like Low-Rank Adaptation (LoRA) to produce the malicious model, it remains unclear whether the attack can still work and pose threats. In this work, we first identify that the attack efficacy is significantly diminished when using LoRA for fine-tuning. Then, we propose LoBAM, a method that yields high attack success rate with minimal training resources. The key idea of LoBAM is to amplify the malicious weights in an intelligent way that effectively enhances the attack efficacy. We demonstrate that our design can lead to improved attack success rate through both theoretical proof and extensive empirical experiments across various model merging scenarios. Moreover, we show that our method has strong stealthiness and is difficult to detect.

attacker, fine-tuning, malicious model, (17 more...)

arXiv.org Artificial Intelligence

2411.16746

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Trojan Cleansing with Neural Collapse

Gu, Xihe, Fields, Greg, Jandali, Yaman, Javidi, Tara, Koushanfar, Farinaz

arXiv.org Artificial IntelligenceNov-19-2024

Trojan attacks are sophisticated training-time attacks on neural networks that embed backdoor triggers which force the network to produce a specific output on any input which includes the trigger. With the increasing relevance of deep networks which are too large to train with personal resources and which are trained on data too large to thoroughly audit, these training-time attacks pose a significant risk. In this work, we connect trojan attacks to Neural Collapse, a phenomenon wherein the final feature representations of over-parameterized neural networks converge to a simple geometric structure. We provide experimental evidence that trojan attacks disrupt this convergence for a variety of datasets and architectures. We then use this disruption to design a lightweight, broadly generalizable mechanism for cleansing trojan attacks from a wide variety of different network architectures and experimentally demonstrate its efficacy.

artificial intelligence, machine learning, neural collapse, (19 more...)

arXiv.org Artificial Intelligence

2411.12914

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Information Technology > Security & Privacy (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Analyzing Multi-Head Attention on Trojan BERT Models

Wang, Jingwei

arXiv.org Artificial IntelligenceJun-12-2024

Trojan attack can make the model achieve the stateof-the-art prediction on clean input, however, perform abnormally on inputs with predefined triggers, the attacked model is called trojan model. Fig 1 shows the trojan attack examples: if you only input the black font sentence (clean input), the trojan model will output the normal prediction label, modifies Layer-wise Relevance Propagation and while you insert the specific trigger (red font) to head confidence to indicate head importance on sentence, the trojan model will output the flipped translation task, but it's not the case on many other label.

benign model, prediction, trojan model, (14 more...)

arXiv.org Artificial Intelligence

2406.16925

Genre: Research Report (0.64)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback